Area and Performance Optimization of Barrier Synchronization on Multi-core Network-on-Chips

Authors

  • Xiaowen Chen
  • Shuming Chen
  • Zhonghai Lu
  • Axel Jantsch
Abstract

Barrier synchronization is widely used to synchronize the execution of parallel processor cores on multi-core Network-on-Chips (NoCs). Because its global nature may cause heavy serialization and thus a large performance penalty, barrier synchronization should be carefully designed for low-latency communication and minimal overall completion time. In this paper, we therefore propose a fast barrier synchronization mechanism targeting multi-core NoCs. The mechanism includes a dedicated hardware module, named Fast Barrier Synchronizer (FBS), integrated with each processor node. The FBS offers a set of barrier counters and can concurrently process synchronization requests issued by the local node and by remote nodes via the on-chip network. The salient feature of the mechanism is that, once the barrier condition is reached, the “barrier release” acknowledgement is broadcast to all processor nodes. Broadcasting saves chip area, because source node information need not be stored, and minimizes completion time, because barrier releasing is not serialized. Synthesis results suggest that the FBS can run at over 1 GHz in SMIC 130 nm technology with small area overhead. We implemented an FBS-enhanced multi-core NoC architecture on our FPGA platform, which uses a Xilinx Virtex 5 FPGA. FPGA utilization and simulation results show that our fast barrier synchronization has both area and performance advantages over its counterpart with unicast barrier releasing.

Keywords: Barrier Synchronization; Multi-core; Network-on-
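The contrast between broadcast and unicast barrier releasing can be illustrated with a small software model. This is only a sketch under stated assumptions, not the FBS hardware design: the class names `BroadcastBarrier` and `UnicastBarrier`, the `expected` parameter, and the helper functions are hypothetical, and a broadcast is counted as a single release operation at the node holding the counter, whatever its actual cost in the network.

```python
from dataclasses import dataclass, field


@dataclass
class BroadcastBarrier:
    """Hypothetical barrier counter that keeps only a count, never requester IDs."""
    expected: int   # number of nodes that must arrive (assumed parameter)
    count: int = 0

    def arrive(self) -> bool:
        """Register one arrival; return True when the barrier condition is met."""
        self.count += 1
        if self.count == self.expected:
            self.count = 0   # reset so the counter can be reused
            return True      # caller issues a single broadcast release
        return False


@dataclass
class UnicastBarrier:
    """Hypothetical counterpart that stores every source to release it individually."""
    expected: int
    sources: list = field(default_factory=list)

    def arrive(self, node_id: int) -> list:
        """Register one arrival; return the nodes to release once all have arrived."""
        self.sources.append(node_id)
        if len(self.sources) == self.expected:
            released, self.sources = self.sources, []
            return released  # one release message per stored source, sent in turn
        return []


def run_broadcast(num_nodes: int) -> int:
    """Count release operations issued when every node arrives once."""
    barrier = BroadcastBarrier(expected=num_nodes)
    releases = 0
    for _node in range(num_nodes):
        if barrier.arrive():
            releases += 1    # one broadcast is assumed to reach every node
    return releases


def run_unicast(num_nodes: int) -> int:
    """Count release messages issued when every node arrives once."""
    barrier = UnicastBarrier(expected=num_nodes)
    releases = 0
    for node in range(num_nodes):
        releases += len(barrier.arrive(node))
    return releases
```

In this model, `run_broadcast(n)` issues one release operation and stores a single integer, while `run_unicast(n)` stores `n` source IDs and issues `n` release messages one after another. That mirrors, in a simplified way, the area and serialization argument made in the abstract.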

Similar resources

Robust Control Synchronization on Multi-Story Structure under Earthquake Loads and Random Forces using H∞ Algorithm

In this paper, the concept of synchronization control along with robust H∞ control are considered to evaluate the seismic response control on multi-story structures. To show the accuracy of the novel algorithm, a five-story structure is evaluated under the EL-Centro earthquake load. In order to find the performance of the novel algorithm, random and uncertainty processes corresponding...

Improving Performance of Collection-Oriented Operations through Parallel Fusion

To more fully utilize the potential offered by multi-core processors, programming languages must have features for expressing parallelism. One promising approach is collection-oriented operations, which are easily expressed by the programmer and can be implemented by the runtime system in a parallel fashion for improved performance. However, the ordinary implementation requires a barrier synchr...

Application Mapping onto Network-on-Chip using Bypass Channel

The growing number of cores integrated on a chip, together with the problems of systems-on-chip, has led to the emergence of networks-on-chip (NoCs). NoCs offer features such as scalability and high performance. The NoC architecture provides the communication infrastructure, and the blocks that communicate with each other through it make up the NoC. Due to increasing number of cores, the placement of the cores i...

Operator Fusion in a Data Parallel Library

To more fully utilize the potential offered by multi-core processors, programming languages must have features for expressing parallelism. One promising approach is collection-oriented operations, which are easily expressed by the programmer and can be implemented by the runtime system in a parallel fashion for improved performance. However, the ordinary implementation requires a barrier synchr...

Heterogeneous Networks of Workstations across Wide Area Networks (thesis accepted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Engineering)

Networks made up of various systems are scattered across wide area networks, and together they contribute to the heterogeneous environment of the computational grid. Although they are an immense source of computing resources, the core weakness of connecting these networks blindly together is that their network link speeds vary. Bottlenecks in communications occur due to the varied ...

Publication date: 2010